60 research outputs found

Analysis of requirements on biobank and study workflows

Author: Morris Swertz
Ondřej Vojtíšek
Petr Holub
Pieter Neerincx

Human biobanks, which are the core of the BBMRI-ERIC medical research infrastructure, are repositories of biological material and data associated with the research participants (donors or patients willing to participate in the research). The associated data covers a broad range of data types: from data collected directly from the research participants and medical processes related to them, to data generated from the biological material. This document focuses on describing biobank data processing workflows that were selected for piloting in EGI-Engage by BBMRI.nl and BBMRI.cz (national nodes of BBMRI-ERIC) together with their associated biobanks. The main focus is on proteomics and genomics workflows, which cover both extremes of the privacy-sensitive data processing spectrum: from relatively non-sensitive applications to very sensitive ones.

ZENODO

OligoRAP – an Oligo Re-Annotation Pipeline to improve annotation and estimate target specificity

Author: Breit Timo M
Groenen Martien AM
Leunissen Jack AM
Neerincx Pieter BT
Nie Haisheng
Rauwerda Han
Publication venue: BioMed Central
Publication date: 01/01/2009

Background - High-throughput gene expression studies using oligonucleotide microarrays depend on the specificity of each oligonucleotide (oligo or probe) for its target gene. However, target-specific probes can only be designed when a reference genome of the species at hand has been completely sequenced, when this genome has been completely annotated and when the genetic variation of the sampled individuals is completely known. Unfortunately, there is not a single species for which such a complete data set is available. Therefore, it is important that probe annotation can be updated frequently for optimal interpretation of microarray experiments. Results - In this paper we present OligoRAP, a pipeline to automatically update the annotation of oligo libraries and estimate oligo target specificity. OligoRAP uses a reference genome assembly with Ensembl and Entrez Gene annotation supplemented with a set of unmapped transcripts derived from RefSeq and UniGene to handle assembly gaps. OligoRAP produces alignments of each oligo with the reference assembly as well as with unmapped transcripts. These alignments are re-mapped to the annotation sources, which results in a concise, as complete as possible and up-to-date annotation of the oligo library. The building blocks of this pipeline are BioMoby web services, creating a highly modular and distributed system with a robust, remote programmatic interface. OligoRAP was used to update the annotation for a subset of 791 oligos from the ARK-Genomics 20 K chicken array, which were selected as starting material for the oligo annotation session of the EADGENE/SABRE Post-analysis workshop. Based on the updated annotation, about one third of these oligos are problematic with regard to target specificity. In addition, for almost half of the oligos, the accession numbers or ids they were originally designed for no longer exist in the updated annotation.
Conclusion - As microarrays are designed on incomplete data, it is important to update probe annotation and check target specificity regularly. OligoRAP provides both, and due to its design based on BioMoby web services it can easily be embedded as an oligo annotation engine in customised applications for microarray data analysis. The dramatic difference in updated annotation and target specificity for the ARK-Genomics 20 K chicken array as compared to the original data emphasises the need for regular updates.

Crossref

Springer - Publisher Connector

PubMed Central

Edinburgh Research Explorer

UvA-DARE

International Migration, Integration and Social Cohesion online publications

Microarray data mining using Bioconductor packages

Author: Bicciato Silvio
Ferrari Francesco
Groenen Martien AM
Leunissen Jack AM
Neerincx Pieter BT
Nie Haisheng
Poel Jan van der
Publication venue: BioMed Central
Publication date: 01/01/2009

This article is available from

CiteSeerX

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Wageningen University & Research Publications

Archivio istituzionale della ricerca - Università di Modena e Reggio Emilia

Comparison of three microarray probe annotation pipelines: differences in strategies and their effect on downstream analysis

Author: Casel Pierrot
Groenen Martien AM
Klopp Christophe
Leunissen Jack AM
Neerincx Pieter BT
Nie Haisheng
Prickett Dennis
Watson Michael
Publication venue: BioMed Central
Publication date: 01/01/2009

Background - Reliable annotation linking oligonucleotide probes to target genes is essential for functional biological analysis of microarray experiments. We used the IMAD, OligoRAP and sigReannot pipelines to update the annotation for the ARK-Genomics Chicken 20 K array as part of a joint EADGENE/SABRE workshop. In this manuscript we compare their annotation strategies and results. Furthermore, we analyse the effect of differences in updated annotation on functional analysis for an experiment involving Eimeria-infected chickens, and finally we propose guidelines for optimal annotation strategies. Results - IMAD, OligoRAP and sigReannot update both annotation and estimated target specificity. The 3 pipelines can assign oligos to target specificity categories, although with varying degrees of resolution. Target specificity is judged based on the amount and type of oligo versus target-gene alignments (hits), which are determined by filter thresholds that users can adjust based on their experimental conditions. Linking oligos to annotation, on the other hand, is based on rigid rules, which differ between pipelines. For 52.7% of the oligos from a subset selected for in-depth comparison, all pipelines linked to one or more Ensembl genes, with consensus on 44.0%. In 31.0% of the cases none of the pipelines could assign an Ensembl gene to an oligo, and for the remaining 16.3% the coverage differed between pipelines. Differences in updated annotation were mainly due to different thresholds for hybridisation potential filtering of oligo versus target-gene alignments and different policies for expanding annotation using indirect links. The differences in updated annotation packages had a significant effect on GO term enrichment analysis, with consensus on only 67.2% of the enriched terms.
Conclusion - In addition to flexible thresholds to determine target specificity, annotation tools should provide metadata describing the relationships between oligos and the annotation assigned to them. These relationships can then be used to judge the varying degrees of reliability, allowing users to fine-tune the balance between reliability and coverage. This is important as it can have a significant effect on functional microarray analysis, as exemplified by the lack of consensus on almost one third of the terms found with GO term enrichment analysis based on updated IMAD, OligoRAP or sigReannot annotation.

Crossref

Springer

Springer - Publisher Connector

PubMed Central

Edinburgh Research Explorer

Wageningen University & Research Publications

Using R in Taverna: RShell v1.2

Author: Breit Timo M
Leunissen Jack AM
Neerincx Pieter BT
Nijholt Anton
Rauwerda Han
Vet Paul E van der
Wassink Ingo
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: R is the statistical language commonly used by many life scientists in (omics) data analysis. At the same time, these complex analyses benefit from a workflow approach, such as that used by the open source workflow management system Taverna. However, Taverna had limited support for R, because it supported just a few data types and only a single output. Also, there was no support for graphical output and persistent sessions. Altogether this made using R in Taverna impractical.
Findings: We have developed an R plugin for Taverna: RShell, which provides R functionality within workflows designed in Taverna. In order to fully support the R language, our RShell plugin directly uses the R interpreter. The RShell plugin consists of a Taverna processor for R scripts and an RShell Session Manager that communicates with the R server. We made the RShell processor highly configurable, allowing the user to define multiple inputs and outputs. Also, various data types are supported, such as strings, numeric data and images. To limit data transport between multiple RShell processors, the RShell plugin also supports persistent sessions. Here, we describe the architecture of RShell and the new features that are introduced in version 1.2, i.e.: i) Support for R up to and including R version 2.9; ii) Support for persistent sessions to limit data transfer; iii) Support for vector graphics output through PDF; iv) Syntax highlighting of the R code; v) Improved usability through fewer port types. Our new RShell processor is backwards compatible with workflows that use older versions of the RShell processor. We demonstrate the value of the RShell processor by a use-case workflow that maps oligonucleotide probes designed with DNA sequence information from Vega onto the Ensembl genome assembly.
Conclusion: Our RShell plugin enables Taverna users to employ R scripts within their workflows in a highly configurable way.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

University of Twente Research Information

UvA-DARE

International Migration, Integration and Social Cohesion online publications

Publisher Correction: Germline de novo mutation clusters arise during oocyte aging in genomic regions with high double-strand-break incidence (Nature Genetics, (2018), 50, 4, (487-492), 10.1038/s41588-018-0071-6)

Author: Bodian Dale L.
Deeken John F.
Gilissen Christian
Goldmann Jakob M.
Neerincx Pieter B.
Niederhuber John E.
Seplyarskiy Vladimir B.
Solomon Benjamin D.
Veltman Joris A.
Vilboux Thierry
Wong Wendy S.W.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/08/2021

In the HTML version of the article originally published, the figures for Supplementary Figures 1–15 were incorrect and did not match the correct figures in the PDF of Supplementary Text and Figures. The error has been corrected in the HTML version of the article

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Dissertations of the University of Groningen

Gene Expression in Chicken Reveals Correlation with Structural Genomic Features and Conserved Patterns of Transcription in the Terrestrial Vertebrates

Author: Aart Lammers
AE Vinogradov
AJ Hulbert
AM Boutanaev
BY Liao
CI Castillo-Davis
Darren P. Martin
DK Kim
E Eisenberg
ET Chan
Evert M. van Schothorst
GK Smyth
H Caron
H Nie
Haisheng Nie
Hendrik-Jan Megens
Jaap Keijer
Jack A. M. Leunissen
M Kimura
Martien A. M. Groenen
P Khaitovich
PB Neerincx
Pieter B. T. Neerincx
RC Gentleman
Richard P. M. A. Crooijmans
RW Morgan
S Durinck
S Falcon
S van Hemert
T Mijalski
W Huber
W Zhang
Publication venue: Public Library of Science
Publication date: 01/01/2010

Background - The chicken is an important agricultural and avian-model species. A survey of gene expression in a range of different tissues will provide a benchmark for understanding expression levels under normal physiological conditions in birds. With expression data for birds being very scant, this benchmark is of particular interest for comparative expression analysis among various terrestrial vertebrates. Methodology/Principal Findings - We carried out a gene expression survey in eight major chicken tissues using whole genome microarrays. A global picture of gene expression is presented for the eight tissues, and tissue-specific as well as common gene expression was identified. A Gene Ontology (GO) term enrichment analysis showed that tissue-specific genes are enriched with GO terms reflecting the physiological functions of the specific tissue, and housekeeping genes are enriched with GO terms related to essential biological functions. Comparisons of structural genomic features between tissue-specific genes and housekeeping genes show that housekeeping genes are more compact. Specifically, their coding sequences and particularly their introns are shorter than those of genes that display more variation in expression between tissues, and their intergenic space is also shorter. Meanwhile, housekeeping genes are more likely to co-localize with other abundantly or highly expressed genes in the same chromosomal regions. Furthermore, comparisons of gene expression in a panel of five common tissues between birds, mammals and amphibians showed that the expression patterns across tissues are highly similar for orthologous genes compared to random gene pairs within each pair-wise comparison, indicating a high degree of functional conservation in gene expression among terrestrial vertebrates.
Conclusions - The housekeeping genes identified in this study have shorter gene length, shorter coding sequence length, shorter introns, and shorter intergenic regions; there seems to be selection pressure on economy in genes with a wide tissue distribution, i.e. these genes are more compact. A comparative analysis showed that the expression patterns of orthologous genes are conserved in the terrestrial vertebrates during evolution.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Edinburgh Research Explorer

Wageningen University & Research Publications

Effect of host genetics on the gut microbiome in 7,738 participants of the Dutch Microbiome Project

Author: Andreu-Sánchez Sergio
Bolte Laura A
Brandao Gois Milla F
Chen Lianmin
Collij Valerie
Fu Jingyuan
Gacesa Ranko
Harmsen Hermie J M
Hu Shixian
Klaassen Marjolein A Y
Kurilshikov Alexander
Lopera-Maya Esteban A
Neerincx Pieter B T
Sanna Serena
Sinha Trishla
Swertz Morris A
van der Graaf Adriaan
Vila Arnau Vich
Weersma Rinse K
Wijmenga Cisca
Zhernakova Alexandra
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/02/2022

Host genetics are known to influence the gut microbiome, yet their role remains poorly understood. To robustly characterize these effects, we performed a genome-wide association study of 207 taxa and 205 pathways representing microbial composition and function in 7,738 participants of the Dutch Microbiome Project. Two robust, study-wide significant (P < 1.89 × 10⁻¹⁰) signals near the LCT and ABO genes were found to be associated with multiple microbial taxa and pathways and were replicated in two independent cohorts. The LCT locus associations seemed modulated by lactose intake, whereas those at ABO could be explained by participant secretor status determined by their FUT2 genotype. Twenty-two other loci showed suggestive evidence (P < 5 × 10⁻⁸) of association with microbial taxa and pathways. At a more lenient threshold, the number of loci we identified strongly correlated with trait heritability, suggesting that much larger sample sizes are needed to elucidate the remaining effects of host genetics on the gut microbiome.

University of Groningen

ARTS repository - University of Groningen

Dissertations of the University of Groningen

Computational pan-genomics: status, promises and challenges

Author: Abeel Thomas
Alkan Can
Baaijens Jasmijn
Bakker Paul
Boeva Valentina
Bonnal Raoul
Chiaromonte Francesca
Chikhi Rayan
Ciccarelli Francesca
Cijvat Robin
Datema Erwin
Dijkstra Louis
Duijn Cornelia
Dutilh Bas
Eichler Evan
El-Kebir Mohammed
Ernst Corinna
Eskin Eleazar
Garrison Erik
Ghaffaari Ali
Guryev Victor
Kersey Paul
Klau Gunnar
Kloosterman Wigard
Korbel Jan
Lameijer Eric-Wubbo
Langmead Benjamin
Marschall Tobias
Martin Marcel
Marz Manja
Medvedev Paul
Mu John
Mäkinen Veli
Neerincx Pieter
Novak Adam
Ouwens Klaasjan
Paten Benedict
Peterlongo Pierre
Pisanti Nadia
Porubsky David
Rahmann Sven
Raphael Benjamin
Reinert Knut
Ridder Dick
Ridder Jeroen
Rivals Eric
Sanders Ashley
Schlesner Matthias
Schulz-Trieglaff Ole
Schönhuth Alexander
Sheikhizadeh Siavash
Shneider Carl
Smit Sandra
The Computational Pan-Genomics Consortium
Valenzuela Daniel
Vandin Fabio
Wang Jiayin
Wessels Lodewyk
Ye Kai
Zhang Ying
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2018

Many disciplines, from human genetics and oncology to plant breeding, microbiology and virology, commonly face the challenge of analyzing rapidly increasing numbers of genomes. In the case of Homo sapiens, the number of sequenced genomes will approach hundreds of thousands in the next few years. Simply scaling up established bioinformatics pipelines will not be sufficient for leveraging the full potential of such rich genomic data sets. Instead, novel, qualitatively different computational methods and paradigms are needed. We will witness the rapid extension of computational pan-genomics, a new sub-area of research in computational biology. In this article, we generalize existing definitions and understand a pan-genome as any collection of genomic sequences to be analyzed jointly or to be used as a reference. We examine already available approaches to construct and use pan-genomes, discuss the potential benefits of future technologies and methodologies and review open challenges from the vantage point of the above-mentioned biological disciplines. As a prominent example of a computational paradigm shift, we particularly highlight the transition from the representation of reference genomes as strings to representations as graphs. We outline how this and other challenges from different application domains translate into common computational problems, point out relevant bioinformatics techniques and identify open problems in computer science. With this review, we aim to increase awareness that a joint approach to computational pan-genomics can help address many of the problems currently faced in various domains.

INRIA a CCSD electronic archive server

Archivio della Ricerca - Università di Pisa

EUR Research Repository

HAL-MINES ParisTech

Archivio della ricerca della Scuola Superiore Sant'Anna

Radboud Repository

HAL-Rennes 1

The Genome of the Netherlands: design, and project goals

Author: Abdellaoui Abdel
Beekman Marian
Boomsma Dorret I.
Byelas Heorhiy
Cao Hongzhi
Cao Sujie
Chen Ruoyan
de Bakker Paul I. W.
de Craen Anton J. M.
de Knijff Peter
Deelen Patrick
den Dunnen Johan T.
Dijkstra Martijn
Du Yuanping
Elbers Clara C.
Estrada Karol
Francioli Laurent C.
Guryev Victor
Hehir-Kwa Jayne Y.
Hofman Albert
Hottenga Jouke Jan
Houwing-Duistermaat Jeanine
Kanterakis Alexandros
Karssen Lennart C.
Kattenberg Mathijs
Koval Vyacheslav
Laros Jeroen F. J.
Li Ning
Li Qibin
Li Yingrui
Mai Hailiang
Menelaou Androniki
Neerincx Pieter B. T.
Oostra Ben
Pulit Sara L.
Rivadeneira Fernando
Slagboom Eline P.
Suchiman Eka H. D.
Swertz Morris A.
Uitterlinden Andre G.
van Dijk Freerk
van Duijn Cornelia M.
van Enckevort David
van Leeuwen Elisabeth M.
van Ommen Gert-Jan
van Setten Jessica
Vermaat Martijn
Wang Jun
Wijmenga Cisca
Willemsen Gonneke
Wolffenbuttel Bruce H.
Ye Kai
Publication date: 29/05/2013

Within the Netherlands a national network of biobanks has been established (Biobanking and Biomolecular Research Infrastructure-Netherlands (BBMRI-NL)) as a national node of the European BBMRI. One of the aims of BBMRI-NL is to enrich biobanks with different types of molecular and phenotype data. Here, we describe the Genome of the Netherlands (GoNL), one of the projects within BBMRI-NL. GoNL is a whole-genome-sequencing project in a representative sample consisting of 250 trio-families from all provinces in the Netherlands, which aims to characterize DNA sequence variation in the Dutch population. The parent-offspring trios include adult individuals ranging in age from 19 to 87 years (mean = 53 years; SD = 16 years) from birth cohorts 1910-1994. Sequencing was done on blood-derived DNA from uncultured cells and accomplished coverage was 14-15x. The family-based design represents a unique resource to assess the frequency of regional variants, accurately reconstruct haplotypes by family-based phasing, characterize short indels and complex structural variants, and establish the rate of de novo mutational events. GoNL will also serve as a reference panel for imputation in the available genome-wide association studies in Dutch and other cohorts to refine association signals and uncover population-specific variants. GoNL will create a catalog of human genetic variation in this sample that is uniquely characterized with respect to micro-geographic location and a wide range of phenotypes. The resource will be made available to the research and medical community to guide the interpretation of sequencing projects. The present paper summarizes the global characteristics of the project.

Crossref

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

PubMed Central

Copenhagen University Research Information System

Dissertations of the University of Groningen